32 research outputs found

    Comparing de novo assemblers for 454 transcriptome data

    Background: Roche 454 pyrosequencing has become a method of choice for generating transcriptome data from non-model organisms. Once the tens to hundreds of thousands of short (250-450 base) reads have been produced, it is important to correctly assemble these to estimate the sequence of all the transcripts. Most transcriptome assembly projects use only one program for assembling 454 pyrosequencing reads, but there is no evidence that the programs used to date are optimal. We have carried out a systematic comparison of five assemblers (CAP3, MIRA, Newbler, SeqMan and CLC) to establish best practices for transcriptome assemblies, using a new dataset from the parasitic nematode Litomosoides sigmodontis.
    Results: Although no single assembler performed best on all our criteria, Newbler 2.5 gave longer contigs, better alignments to some reference sequences, and was fast and easy to use. SeqMan assemblies performed best on the criterion of recapitulating known transcripts, and had more novel sequence than the other assemblers, but generated an excess of small, redundant contigs. The remaining assemblers all performed almost as well, with the exception of Newbler 2.3 (the version currently used by most assembly projects), which generated assemblies that had significantly lower total length. As different assemblers use different underlying algorithms to generate contigs, we also explored merging of assemblies and found that the merged datasets not only aligned better to reference sequences than individual assemblies, but were also more consistent in the number and size of contigs.
    Conclusions: Transcriptome assemblies are smaller than genome assemblies and thus should be more computationally tractable, but are often harder because individual contigs can have highly variable read coverage. Comparing single assemblers, Newbler 2.5 performed best on our trial data set, but other assemblers were closely comparable. Combining differently optimal assemblies from different programs however gave a more credible final product, and this strategy is recommended.

    Basal Jawed Vertebrate Phylogenomics Using Transcriptomic Data from Solexa Sequencing

    The traditionally accepted relationships among basal jawed vertebrates have been challenged by some molecular phylogenetic analyses based on mitochondrial sequences. Those studies split extant gnathostomes into two monophyletic groups: tetrapods and a piscine branch that includes Chondrichthyes, Actinopterygii and sarcopterygian fishes. Lungfish and bichir are found in a basal position on the piscine branch. Using transcriptomes that we generated for an armored bichir (Polypterus delhezi) and an African lungfish (Protopterus sp.), together with expressed sequences and whole genome sequences available from public databases, we obtained 111 genes to reconstruct the phylogenetic tree of basal jawed vertebrates and estimated their times of divergence. Our phylogenomic study supports the traditional relationship. We found that gnathostomes are divided into Chondrichthyes and the Osteichthyes, both with 100% support values (posterior probabilities and bootstrap values). Chimaeras were found to have a basal position among cartilaginous fishes with a 100% support value. Osteichthyes were divided into Actinopterygii and Sarcopterygii with 100% support value. Lungfish and tetrapods form a monophyletic group with 100% posterior probability. Bichir and two teleost species form a monophyletic group with 100% support value. The previous tree, based on mitochondrial data, was significantly rejected by an approximately unbiased test (AU test, p = 0). The time of divergence between lungfish and tetrapods was estimated to be 391.8 Ma and the divergence of bichir from pufferfish and medaka was estimated to be 330.6 Ma. These estimates closely match the fossil record. In conclusion, our phylogenomic study successfully resolved the relationship of basal jawed vertebrates based on transcriptomes, EST and whole genome sequences.

    De novo sequencing and characterization of floral transcriptome in two species of buckwheat (Fagopyrum)

    Background: Transcriptome sequencing data has become an integral component of modern genetics, genomics and evolutionary biology. However, despite advances in the technologies of DNA sequencing, such data are lacking for many groups of living organisms, in particular, many plant taxa. We present here the results of transcriptome sequencing for two closely related plant species. These species, Fagopyrum esculentum and F. tataricum, belong to the order Caryophyllales, a large group of flowering plants with uncertain evolutionary relationships. F. esculentum (common buckwheat) is also an important food crop. Despite these practical and evolutionary considerations, Fagopyrum species have not been the subject of large-scale sequencing projects.
    Results: Normalized cDNA corresponding to genes expressed in flowers and inflorescences of F. esculentum and F. tataricum was sequenced using the 454 pyrosequencing technology. This resulted in 267 thousand (F. esculentum) and 229 thousand (F. tataricum) reads with an average length of 341-349 nucleotides. De novo assembly of the reads produced about 25 thousand contigs for each species, with 7.5-8.2× coverage. Comparative analysis of the two transcriptomes demonstrated their overall similarity but also revealed genes that are presumably differentially expressed. Among them are retrotransposon genes and genes involved in sugar biosynthesis and metabolism. Thirteen single-copy genes were used for phylogenetic analysis; the resulting trees are largely consistent with those inferred from multigenic plastid datasets. The sister relationship of the Caryophyllales and asterids has now gained high support from nuclear gene sequences.
    Conclusions: 454 transcriptome sequencing and de novo assembly were performed for two congeneric flowering plant species, F. esculentum and F. tataricum. As a result, a large set of cDNA sequences that represent orthologs of known plant genes as well as potential new genes was generated.

    The maternal and early embryonic transcriptome of the milkweed bug Oncopeltus fasciatus

    Background: Most evolutionary developmental biology ("evo-devo") studies of emerging model organisms focus on small numbers of candidate genes cloned individually using degenerate PCR. However, newly available sequencing technologies such as 454 pyrosequencing have recently begun to allow for massive gene discovery in animals without sequenced genomes. Within insects, although large volumes of sequence data are available for holometabolous insects, developmental studies of basally branching hemimetabolous insects typically suffer from low rates of gene discovery.
    Results: We used 454 pyrosequencing to sequence over 500 million bases of cDNA from the ovaries and embryos of the milkweed bug Oncopeltus fasciatus, which lacks a sequenced genome. This indirectly developing insect occupies an important phylogenetic position, branching basal to Diptera (including fruit flies) and Hymenoptera (including honeybees), and is an experimentally tractable model for short-germ development. 2,087,410 reads from both normalized and non-normalized cDNA assembled into 21,097 sequences (isotigs) and 112,531 singletons. The assembled sequences fell into 16,617 unique gene models, and included predictions of splicing isoforms, which we examined experimentally. Discovery of new genes plateaued after assembly of ~1.5 million reads, suggesting that we have sequenced nearly all transcripts present in the cDNA sampled. Many transcripts have been assembled at close to full length, and there is a net gain of sequence data for over half of the pre-existing O. fasciatus accessions for developmental genes in GenBank. We identified 10,775 unique genes, including members of all major conserved metazoan signaling pathways and genes involved in several major categories of early developmental processes. We also specifically address the effects of cDNA normalization on gene discovery in de novo transcriptome analyses.
    Conclusions: Our sequencing, assembly and annotation framework provides a simple and effective way to achieve high-throughput gene discovery for organisms lacking a sequenced genome. These data will have applications to the study of the evolution of arthropod genes and genetic pathways, and to the wider evolution, development and genomics communities working with emerging model organisms.
    [The sequence data from this study have been submitted to GenBank under study accession number SRP002610 (http://www.ncbi.nlm.nih.gov/sra?term=SRP002610). Custom scripts generated are available at http://www.extavourlab.com/protocols/index.html. Seven Additional files are available.]

    De novo assembly and characterization of a maternal and developmental transcriptome for the emerging model crustacean Parhyale hawaiensis

    Background: Arthropods are the most diverse animal phylum, but their genomic resources are relatively few. While the genome of the branchiopod Daphnia pulex is now available, no other large-scale crustacean genomic resources are available for comparison. In particular, genomic resources are lacking for the most tractable laboratory model of crustacean development, the amphipod Parhyale hawaiensis. Insight into shared and divergent characters of crustacean genomes will facilitate interpretation of future developmental, biomedical, and ecological research using crustacean models.
    Results: To generate a transcriptome enriched for maternally provided and zygotically transcribed developmental genes, we created cDNA from ovaries and embryos of P. hawaiensis. Using 454 pyrosequencing, we sequenced over 1.1 billion bases of this cDNA, and assembled them de novo to create, to our knowledge, the second largest crustacean genomic resource to date. We found an unusually high proportion of C2H2 zinc finger-containing transcripts, as has also been reported for the genome of the pea aphid Acyrthosiphon pisum. Consistent with previous reports, we detected trans-spliced transcripts, but found that they did not noticeably impact transcriptome assembly. Our assembly products yielded 19,067 unique BLAST hits against nr (E-value cutoff e-10). These included over 400 predicted transcripts with significant similarity to D. pulex sequences but not to sequences of any other animal. Annotation of several hundred genes revealed P. hawaiensis homologues of genes involved in development, gametogenesis, and a majority of the members of six major conserved metazoan signaling pathways.
    Conclusions: The amphipod P. hawaiensis has higher transcript complexity than known insect transcriptomes, and trans-splicing does not appear to be a major contributor to this complexity. We discuss the importance of a reliable comparative genomic framework within which to consider findings from new crustacean models such as D. pulex and P. hawaiensis, as well as the need for development of further substantial crustacean genomic resources.

    De novo characterization of the gametophyte transcriptome in bracken fern, Pteridium aquilinum

    Background: Because of their phylogenetic position and unique characteristics of their biology and life cycle, ferns represent an important lineage for studying the evolution of land plants. Large and complex genomes in ferns combined with the absence of economically important species have been a barrier to the development of genomic resources. However, high throughput sequencing technologies are now being widely applied to non-model species. We leveraged the Roche 454 GS-FLX Titanium pyrosequencing platform in sequencing the gametophyte transcriptome of bracken fern (Pteridium aquilinum) to develop genomic resources for evolutionary studies.
    Results: 681,722 quality and adapter trimmed reads totaling 254 Mbp were assembled de novo into 56,256 unique sequences (i.e. unigenes) with a mean length of 547.2 bp and a total assembly size of 30.8 Mbp with an average read-depth coverage of 7.0×. We estimate that 87% of the complete transcriptome has been sequenced and that all transcripts have been tagged. 61.8% of the unigenes had blastx hits in the NCBI nr protein database, representing 22,596 unique best hits. The longest open reading frame in 52.2% of the unigenes had positive domain matches in InterProScan searches. We assigned 46.2% of the unigenes with a GO functional annotation and 16.0% with an enzyme code annotation. Enzyme codes were used to retrieve and color KEGG pathway maps. A comparative genomics approach revealed a substantial proportion of genes expressed in bracken gametophytes to be shared across the genomes of Arabidopsis, Selaginella and Physcomitrella, and identified a substantial number of potentially novel fern genes. By comparing the list of Arabidopsis genes identified by blast with a list of gametophyte-specific Arabidopsis genes taken from the literature, we identified a set of potentially conserved gametophyte specific genes. We screened unigenes for repetitive sequences to identify 548 potentially-amplifiable simple sequence repeat loci and 689 expressed transposable elements.
    Conclusions: This study is the first comprehensive transcriptome analysis for a fern and represents an important scientific resource for comparative evolutionary and functional genomics studies in land plants. We demonstrate the utility of high-throughput sequencing of a normalized cDNA library for de novo transcriptome characterization and gene discovery in a non-model plant.

    Ordinal-Level Phylogenomics of the Arthropod Class Diplopoda (Millipedes) Based on an Analysis of 221 Nuclear Protein-Coding Loci Generated Using Next-Generation Sequence Analyses

    Background The ancient and diverse, yet understudied arthropod class Diplopoda, the millipedes, has a muddled taxonomic history. Despite having a cosmopolitan distribution and a number of unique and interesting characteristics, the group has received relatively little attention; interest in millipede systematics is low compared to taxa of comparable diversity. The existing classification of the group comprises 16 orders. Past attempts to reconstruct millipede phylogenies have suffered from a paucity of characters and included too few taxa to confidently resolve relationships and make formal nomenclatural changes. Herein, we reconstruct an ordinal-level phylogeny for the class Diplopoda using the largest character set ever assembled for the group. Methods Transcriptomic sequences were obtained from exemplar taxa representing much of the diversity of millipede orders using second-generation (i.e., next-generation or high-throughput) sequencing. These data were subject to rigorous orthology selection and phylogenetic dataset optimization and then used to reconstruct phylogenies employing Bayesian inference and maximum likelihood optimality criteria. Ancestral reconstructions of sperm transfer appendage development (gonopods), presence of lateral defense secretion pores (ozopores), and presence of spinnerets were considered. The timings of major millipede lineage divergence points were estimated. Results The resulting phylogeny differed from the existing classifications in a number of fundamental ways. Our phylogeny includes a grouping that has never been described (Juliformia+Merocheta+Stemmiulida), and the ancestral reconstructions suggest caution with respect to using spinnerets as a unifying characteristic for the Nematophora. Our results are shown to have significantly stronger support than previous hypotheses given our data. 
Our efforts represent the first step toward obtaining a well-supported and robust phylogeny of the Diplopoda that can be used to answer many questions concerning the evolution of this ancient and diverse animal group

    Mol. Phylogenet. Evol.

    In recent years, phylogenetic tree reconstructions that rely on multiple gene alignments that had been deduced from expressed sequence tags (ESTs) have become a popular method in molecular systematics. Here, we present a 454 pyrosequencing approach to infer the transcriptome of the Emperor scorpion Pandinus imperator. We obtained 428,844 high-quality reads (mean length = 223 ± 50 b) from total cDNA, which were assembled into 8334 contigs (mean length 422 ± 313 bp) and 26,147 singletons. About 1200 contigs were successfully annotated by BLAST and orthology search. Specific analyses of eight distinct hemocyanin sequences provided further proof for the quality of the 454 reads and the assembly process. The P. imperator sequences were included in a concatenated alignment of 149 orthologous genes of 67 metazoan taxa that covers 39,842 amino acids. After removal of low-quality regions, 11,168 positions were employed for phylogenetic reconstructions. Using Bayesian and maximum likelihood methods, we obtained strongly supported monophyletic Ecdysozoa, Arthropoda (excluding Tardigrada), Euarthropoda, Pancrustacea and Hexapoda. We also recovered the Myriochelata (Chelicerata + Myriapoda). Within the chelicerates, Pycnogonida form the sister group of Euchelicerata. However, Arachnida were found paraphyletic because the Acari (mites and ticks) were recovered as sister group of a clade comprising Xiphosura, Scorpiones and Araneae. In summary, we have shown that 454 pyrosequencing is a cost-effective method that provides sufficient data and coverage depth for gene detection and multigene-based phylogenetic analyses

    Mol. Phylogenet. Evol.

    Onychophora (velvet worms) represent a small animal taxon considered to be related to Euarthropoda. We have obtained 1873 5′ cDNA sequences (expressed sequence tags, ESTs) from the velvet worm Epiperipatus sp., which were assembled into 833 contigs. BLAST similarity searches revealed that 51.9% of the contigs had matches in the protein databases with expectation values lower than 10^-4. Most ESTs had the best hit with proteins from either Chordata or Arthropoda (~40%, respectively). The ESTs included sequences of 27 ribosomal proteins. The orthologous sequences from 28 other species of a broad range of phyla were obtained from the databases, including other EST projects. A concatenated amino acid alignment comprising 5021 positions was constructed, which covers 4259 positions when problematic regions were removed. Bayesian and maximum likelihood methods place Epiperipatus within the monophyletic Ecdysozoa (Onychophora, Arthropoda, Tardigrada and Nematoda), but its exact relation to the Euarthropoda remained unresolved. The “Articulata” concept was not supported. Tardigrada and Nematoda formed a well-supported monophylum, suggesting that Tardigrada are actually Cycloneuralia. In agreement with previous studies, we have demonstrated that random sequencing of cDNAs results in sequence information suitable for phylogenomic approaches to resolve metazoan relationships.